Integration of heterogeneous language resources: A monolingual dictionary and a thesaurus
Authors
Abstract
Linguistic knowledge plays a crucial role in natural language processing, but constructing large linguistic knowledge bases requires substantial human effort and cost. There have been many attempts to construct linguistic knowledge automatically, based on two primary strategies: extracting knowledge from annotated corpora and augmenting existing knowledge bases using annotated corpora. This paper describes an algorithm that enlarges existing linguistic knowledge by integrating heterogeneous linguistic resources. Specifically, the algorithm links a word sense defined in a monolingual dictionary to semantic classes in a thesaurus. In experiments, the algorithm achieves a linking precision of 85.5% and a coverage of 61.4%.
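To make the linking task and the two reported measures concrete, here is a minimal sketch in Python. It is not the paper's algorithm, which the abstract does not spell out. It assumes a naive word-overlap linker and assumes that precision means correct links divided by links made, and that coverage means linked senses divided by all senses. The thesaurus classes, dictionary definitions, gold labels, and function names are all hypothetical toy values chosen for illustration.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy thesaurus: semantic class -> member words (not from the paper).
THESAURUS: Dict[str, Set[str]] = {
    "ANIMAL": {"dog", "cat", "mammal", "animal"},
    "FINANCE": {"money", "bank", "loan", "deposit"},
    "LANDFORM": {"river", "shore", "slope", "land"},
}

# Hypothetical toy dictionary: word sense -> definition text (not from the paper).
DEFINITIONS: Dict[str, str] = {
    "bank#1": "an institution that keeps money and makes a loan",
    "bank#2": "the sloping land beside a river",
    "bark#1": "the sound made by a dog",
    "bark#2": "the outer covering of a tree",
}


def link_sense(definition: str, min_overlap: int = 1) -> Optional[str]:
    """Link a definition to the class sharing the most words with it.

    Assumed heuristic for illustration only. Returns None when no class
    reaches min_overlap, which leaves the sense unlinked.
    """
    words = set(definition.lower().split())
    best_class: Optional[str] = None
    best_score = 0
    for name in sorted(THESAURUS):  # sorted for a deterministic tie-break
        score = len(words & THESAURUS[name])
        if score > best_score:
            best_class, best_score = name, score
    return best_class if best_score >= min_overlap else None


def evaluate(
    links: Dict[str, Optional[str]], gold: Dict[str, str]
) -> Tuple[float, float]:
    """Return (precision, coverage) under the definitions assumed above."""
    linked = {sense: cls for sense, cls in links.items() if cls is not None}
    correct = sum(1 for sense, cls in linked.items() if gold.get(sense) == cls)
    precision = correct / len(linked) if linked else 0.0
    coverage = len(linked) / len(links) if links else 0.0
    return precision, coverage


# Hypothetical gold labels, chosen so that one of the links made is wrong.
GOLD = {"bank#1": "FINANCE", "bank#2": "LANDFORM", "bark#1": "SOUND", "bark#2": "PLANT"}

links = {sense: link_sense(text) for sense, text in DEFINITIONS.items()}
precision, coverage = evaluate(links, GOLD)
print(f"precision={precision:.3f} coverage={coverage:.3f}")
```

In this toy run, three of the four senses receive a link and two of those three links match the gold labels. The example shows why the two measures are reported separately: a linker that declines to link uncertain senses can raise precision while lowering coverage.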
Similar resources
Building a Cross-lingual Relatedness Thesaurus using a Graph Similarity Measure
The Internet is an ever growing source of information stored in documents of different languages. Hence, cross-lingual resources are needed for more and more NLP applications. This paper presents (i) a graph-based method for creating one such resource and (ii) a resource created using the method, a cross-lingual relatedness thesaurus. Given a word in one language, the thesaurus suggests words i...
Automatic Selection and Ranking of Translation Candidates
We propose a method for selecting and ranking translation candidates using, as input, disambiguated source language expressions with thesaurus-compatible senses. This procedure provides the means for choosing contextually appropriate translations automatically once the sense of the expression in the source language is known. Results can be stored to create a database where bilingual dictionary e...
Semantic Evidence for Automatic Identification of Cognates
The identification of cognate word pairs has recently started to attract the attention of NLP research, but it is still a rather unexplored area requiring more focused attention. This paper builds on a purely orthographic approach to this task by introducing semantic evidence in the form of monolingual thesauri and corpora to support the identification process. The proposed method is easily por...
EFL Translation Students' Perspective toward Using Bilingual Dictionary in Translation of Polysemous Words
This research examined the use of bilingual dictionaries and addressed EFL translation students' points of view on using a bilingual dictionary to translate polysemous words (English to Persian). Moreover, it aimed at finding the possible relationship between the effect of students' use of a bilingual dictionary in translating polysemous words and their achieved scores. In the study ...
Ontologising Relational Triples into a Portuguese Thesaurus
Having in mind the automatic acquisition and integration of knowledge from different heterogeneous resources, this paper proposes several automatic methods for attaching term-based relational triples to the synsets of a thesaurus, without exploiting the extraction context for disambiguation. After using the proposed methods to attach triples, extracted from a Portuguese dictionary, to the synse...